Provably Efficient Reinforcement Learning with Linear Function Approximation
Authors
Abstract
Modern reinforcement learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy. The introduction of function approximation raises a fundamental set of challenges involving computational and statistical efficiency, especially given the need to manage the exploration/exploitation trade-off. As a result, a core RL question remains open: how can we design provably efficient RL algorithms that incorporate function approximation? This question persists even in a basic setting with linear dynamics and linear rewards, for which only linear function approximation is needed. This paper presents the first provable RL algorithm with both polynomial run time and polynomial sample complexity in this linear setting, without requiring a "simulator" or additional assumptions. Concretely, we prove that an optimistic modification of least-squares value iteration (a classical algorithm frequently studied in the linear setting) achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps. Importantly, such regret is independent of the number of states and actions. Funding: This work was supported by the Defense Advanced Research Projects Agency program on Lifelong Learning Machines.
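The optimistic least-squares value iteration the abstract describes can be sketched as follows. This is a minimal illustrative implementation in the spirit of the paper's algorithm on a toy tabular MDP encoded with one-hot features; the environment, the bonus scale `beta`, and all constants are assumptions for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP encoded as a linear MDP via one-hot features phi(s, a).
n_states, n_actions, H = 3, 2, 3
d = n_states * n_actions
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # transitions
R = rng.uniform(size=(n_states, n_actions))                        # rewards in [0, 1]

def phi(s, a):
    v = np.zeros(d)
    v[s * n_actions + a] = 1.0
    return v

beta, lam = 1.0, 1.0       # exploration-bonus scale and ridge parameter (illustrative)
K = 100                    # number of episodes

Lam = [lam * np.eye(d) for _ in range(H)]   # Gram matrices, one per step h
data = [[] for _ in range(H)]               # stored (phi, next state, reward) triples
w = np.zeros((H, d))

def Q(h, s, a):
    # Optimistic Q-value: ridge-regression estimate plus an elliptical bonus,
    # clipped at H since returns are bounded by the horizon.
    p = phi(s, a)
    bonus = beta * np.sqrt(p @ np.linalg.solve(Lam[h], p))
    return min(w[h] @ p + bonus, H)

total_reward = 0.0
for k in range(K):
    # Backward pass: ridge regression of r + max_a' Q_{h+1}(s', a') on features.
    for h in reversed(range(H)):
        b = np.zeros(d)
        for p, s2, r in data[h]:
            v_next = 0.0 if h == H - 1 else max(Q(h + 1, s2, a) for a in range(n_actions))
            b += p * (r + v_next)
        w[h] = np.linalg.solve(Lam[h], b)
    # Forward pass: act greedily w.r.t. the optimistic Q and record the data.
    s = 0
    for h in range(H):
        a = max(range(n_actions), key=lambda a: Q(h, s, a))
        r = R[s, a]
        s2 = rng.choice(n_states, p=P[s, a])
        p = phi(s, a)
        Lam[h] += np.outer(p, p)
        data[h].append((p, s2, r))
        total_reward += r
        s = s2

print(total_reward / K)  # average per-episode return, bounded by H
```

The elliptical bonus β·√(φᵀΛ⁻¹φ) shrinks for frequently visited state-action directions, which is what drives exploration toward under-sampled features.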
Similar resources
Convergent Combinations of Reinforcement Learning with Linear Function Approximation
Convergence for iterative reinforcement learning algorithms like TD(0) depends on the sampling strategy for the transitions. However, in practical applications it is convenient to take transition data from arbitrary sources without losing convergence. In this paper we investigate the problem of repeated synchronous updates based on a fixed set of transitions. Our main theorem yields sufficient ...
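The setting this snippet describes can be sketched as repeated synchronous TD(0)-style sweeps over one fixed batch of transitions, here on a small deterministic chain with one-hot features; the chain, step size, and sweep count are illustrative assumptions.

```python
import numpy as np

# Fixed batch of transitions (s, r, s') from a 5-state cycle with a single
# reward; linear values via one-hot features, updated synchronously.
n, gamma, alpha = 5, 0.9, 0.1
transitions = [(s, 1.0 if s == n - 1 else 0.0, (s + 1) % n) for s in range(n)]

phi = np.eye(n)            # feature matrix: phi[s] is the feature of state s
w = np.zeros(n)

for _ in range(5000):      # repeated synchronous sweeps over the fixed set
    grad = np.zeros(n)
    for s, r, s2 in transitions:
        td_error = r + gamma * w @ phi[s2] - w @ phi[s]
        grad += td_error * phi[s]
    w += alpha * grad / len(transitions)

print(np.round(w, 3))      # converges to the values solving V = r + gamma P V
```

Because every transition in the fixed set is applied in each sweep, the update behaves like a deterministic fixed-point iteration rather than a stochastic one.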
Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value funct...
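As one concrete instance of "different solutions for different objectives", the sketch below contrasts the TD fixed point (the LSTD solution) with the Bellman-residual least-squares solution on a tiny cycle with a single restrictive feature, where the two objectives give distinct weights. Everything here (the chain, rewards, and feature) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

# Two classical linear policy-evaluation solutions fit to the same data:
# the TD fixed point (LSTD) and the Bellman-residual minimizer.
gamma = 0.9
P = np.roll(np.eye(4), 1, axis=1)            # deterministic cycle 0->1->2->3->0
r = np.array([0.0, 0.0, 0.0, 1.0])
Phi = np.array([[1.0], [2.0], [3.0], [4.0]])  # a single restrictive feature

M = Phi - gamma * P @ Phi                     # "residual features" Phi - gamma*P*Phi

# LSTD / TD fixed point: solve Phi^T M w = Phi^T r.
w_td = np.linalg.solve(Phi.T @ M, Phi.T @ r)

# Bellman-residual solution: minimize ||M w - r||^2 in the least-squares sense.
w_br, *_ = np.linalg.lstsq(M, r, rcond=None)

print(float(w_td[0]), float(w_br[0]))         # two distinct "optimal" weights
```

With a full-rank feature matrix the two solutions coincide with the exact value function; the gap only appears once the representation is restricted, which is exactly the regime the snippet discusses.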
Reinforcement Learning with Linear Function Approximation and LQ control Converges
Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of function approximation based RL control algorithms. In this paper we show that TD(0) and Sarsa(0) with linear function approximation are convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem)...
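A sketch of the kind of LQ instance this snippet refers to: TD(0) policy evaluation for a stable scalar linear system with quadratic cost, where the value function V(x) = w·x² is exactly linear in the feature φ(x) = x². The system, step size, and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

a, q, gamma = 0.8, 1.0, 0.9   # closed-loop dynamics x' = a*x, cost q*x^2
w, alpha = 0.0, 0.05

for _ in range(20000):
    x = rng.uniform(-1, 1)            # sample a state to update from
    x2 = a * x                        # deterministic closed-loop step
    cost = q * x * x
    td_error = cost + gamma * w * x2 * x2 - w * x * x
    w += alpha * td_error * (x * x)   # linear-FA TD(0) update on phi(x) = x^2

w_true = q / (1 - gamma * a * a)      # fixed point of the Bellman equation
print(w, w_true)
```

Because the closed-loop system is stable (|a| < 1) and the feature matches the quadratic value function exactly, the TD(0) iterate contracts toward the Bellman fixed point q/(1 - γa²).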
Residual Algorithms: Reinforcement Learning with Function Approximation
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basisfunction system, a memory-based learning syst...
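The instability this snippet describes, and the residual-algorithm fix, can be seen in a classic two-state counterexample: semi-gradient TD diverges under repeated updates of a single transition, while the residual-gradient update (gradient descent on the squared Bellman residual) remains stable. This is a minimal illustrative sketch; the features and constants are assumptions.

```python
# Two-state example with features 1 and 2, so V = [w, 2w], zero reward.
# Only the transition 0 -> 1 is ever updated (off-distribution sampling).
gamma, alpha = 0.99, 0.1
phi = [1.0, 2.0]                  # feature of state 0 and state 1
w_td = w_rg = 0.1                 # both start at the same nonzero weight

for _ in range(200):
    # Semi-gradient TD: treats the bootstrapped target as a constant.
    delta_td = gamma * w_td * phi[1] - w_td * phi[0]
    w_td += alpha * delta_td * phi[0]
    # Residual gradient: true gradient of delta^2 / 2 w.r.t. w.
    delta_rg = gamma * w_rg * phi[1] - w_rg * phi[0]
    w_rg -= alpha * delta_rg * (gamma * phi[1] - phi[0])

print(w_td, w_rg)  # TD weight blows up, residual-gradient weight shrinks to 0
```

Each TD step multiplies the weight by 1 + α(γ·2 − 1) > 1, so it diverges geometrically; the residual-gradient step multiplies it by 1 − α(γ·2 − 1)² < 1, so it decays, matching the stability guarantee of residual algorithms.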
Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent’s optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of ...
Journal
Journal title: Mathematics of Operations Research
Year: 2023
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2022.1309